Project Description¶
Working on behalf of a company that sells motorcycle parts, we'll dig into their data to understand their revenue streams. We will find out how much net revenue they are generating across their product lines, segregating by date and warehouse.
The company operates three warehouses in the area, selling both retail and wholesale. They offer a variety of parts and accept credit cards, cash, and bank transfer as payment methods. However, each payment type incurs a different fee.
The board of directors wants to gain a better understanding of wholesale revenue by product line, and how this varies month-to-month and across warehouses. You have been tasked with calculating net revenue for each product line and grouping results by month and warehouse. The results should be filtered so that only "Wholesale" orders are included.
They have provided you with access to their database, which contains the following table called sales
Sales¶
Column | Data type | Description |
---|---|---|
order_number |
VARCHAR |
Unique order number. |
date |
DATE |
Date of the order, from June to August 2021. |
warehouse |
VARCHAR |
The warehouse that the order was made from— North , Central , or West . |
client_type |
VARCHAR |
Whether the order was Retail or Wholesale . |
product_line |
VARCHAR |
Type of product ordered. |
quantity |
INT |
Number of products ordered. |
unit_price |
FLOAT |
Price per product (dollars). |
total |
FLOAT |
Total price of the order (dollars). |
payment |
VARCHAR |
Payment method—Credit card , Transfer , or Cash . |
payment_fee |
FLOAT |
Percentage of total charged as a result of the payment method. |
Lets Find out how much Wholesale net revenue each product_line generated per month per warehouse in the dataset.¶
# load the dataset
import pandas as pd
df = pd.read_csv('sales.csv', usecols=lambda column: column != "index")
df.head()
order_number | date | warehouse | client_type | product_line | quantity | unit_price | total | payment | payment_fee | |
---|---|---|---|---|---|---|---|---|---|---|
0 | N1 | 2021-06-01T00:00:00.000 | North | Retail | Braking system | 9 | 19.29 | 173.61 | Cash | 0.00 |
1 | N2 | 2021-06-01T00:00:00.000 | North | Retail | Suspension & traction | 8 | 32.93 | 263.45 | Credit card | 0.03 |
2 | N3 | 2021-06-01T00:00:00.000 | North | Wholesale | Frame & body | 16 | 37.84 | 605.44 | Transfer | 0.01 |
3 | N4 | 2021-06-01T00:00:00.000 | North | Wholesale | Suspension & traction | 40 | 37.37 | 1494.80 | Transfer | 0.01 |
4 | N5 | 2021-06-01T00:00:00.000 | North | Retail | Frame & body | 6 | 45.44 | 272.61 | Credit card | 0.03 |
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1000 entries, 0 to 999 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 order_number 1000 non-null object 1 date 1000 non-null object 2 warehouse 1000 non-null object 3 client_type 1000 non-null object 4 product_line 1000 non-null object 5 quantity 1000 non-null int64 6 unit_price 1000 non-null float64 7 total 1000 non-null float64 8 payment 1000 non-null object 9 payment_fee 1000 non-null float64 dtypes: float64(3), int64(1), object(6) memory usage: 78.3+ KB
# Convert 'date' column to datetime object
df['date'] = pd.to_datetime(df['date'])
# Extracting month from date, since we want to analyze the revenue per month as well
df['month'] = df['date'].dt.strftime('%b')
# Creating the net_revenue column
df['net_revenue'] = df['total'] - df['payment_fee']
df.head()
order_number | date | warehouse | client_type | product_line | quantity | unit_price | total | payment | payment_fee | month | net_revenue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | N1 | 2021-06-01 | North | Retail | Braking system | 9 | 19.29 | 173.61 | Cash | 0.00 | Jun | 173.61 |
1 | N2 | 2021-06-01 | North | Retail | Suspension & traction | 8 | 32.93 | 263.45 | Credit card | 0.03 | Jun | 263.42 |
2 | N3 | 2021-06-01 | North | Wholesale | Frame & body | 16 | 37.84 | 605.44 | Transfer | 0.01 | Jun | 605.43 |
3 | N4 | 2021-06-01 | North | Wholesale | Suspension & traction | 40 | 37.37 | 1494.80 | Transfer | 0.01 | Jun | 1494.79 |
4 | N5 | 2021-06-01 | North | Retail | Frame & body | 6 | 45.44 | 272.61 | Credit card | 0.03 | Jun | 272.58 |
# Filter only Wholesale client_type
df_wholesale = df[df['client_type'] == 'Wholesale']
# Group by product_line, month, warehouse and sum net_revenue
revenue_by_product_line = df_wholesale.groupby(['product_line', 'month', 'warehouse'], as_index=False).agg({'net_revenue': 'sum'})
# Sort by product_line, month, and net_revenue in descending order
revenue_by_product_line = revenue_by_product_line.sort_values(by=['product_line', 'month', 'net_revenue'], ascending=[True, True, False])
# Display the result
revenue_by_product_line
product_line | month | warehouse | net_revenue | |
---|---|---|---|---|
0 | Braking system | Aug | Central | 3039.41 |
2 | Braking system | Aug | West | 2500.67 |
1 | Braking system | Aug | North | 1770.84 |
3 | Braking system | Jul | Central | 3778.65 |
5 | Braking system | Jul | West | 3060.93 |
4 | Braking system | Jul | North | 2594.44 |
6 | Braking system | Jun | Central | 3684.89 |
7 | Braking system | Jun | North | 1487.77 |
8 | Braking system | Jun | West | 1212.75 |
10 | Electrical system | Aug | North | 4721.12 |
9 | Electrical system | Aug | Central | 3126.43 |
11 | Electrical system | Aug | West | 1241.84 |
12 | Electrical system | Jul | Central | 5577.62 |
13 | Electrical system | Jul | North | 1710.13 |
14 | Electrical system | Jul | West | 449.46 |
15 | Electrical system | Jun | Central | 2904.93 |
16 | Electrical system | Jun | North | 2022.50 |
17 | Engine | Aug | Central | 9528.71 |
18 | Engine | Aug | North | 2324.19 |
19 | Engine | Jul | Central | 1827.03 |
20 | Engine | Jul | North | 1007.14 |
21 | Engine | Jun | Central | 6548.85 |
22 | Frame & body | Aug | Central | 8657.99 |
23 | Frame & body | Aug | North | 7898.89 |
24 | Frame & body | Aug | West | 829.69 |
26 | Frame & body | Jul | North | 6154.61 |
25 | Frame & body | Jul | Central | 3135.13 |
27 | Frame & body | Jun | Central | 5111.34 |
28 | Frame & body | Jun | North | 4910.12 |
29 | Frame & body | Jun | West | 2779.74 |
31 | Miscellaneous | Aug | North | 1841.40 |
30 | Miscellaneous | Aug | Central | 1739.76 |
32 | Miscellaneous | Aug | West | 813.43 |
33 | Miscellaneous | Jul | Central | 3118.44 |
34 | Miscellaneous | Jul | North | 2404.65 |
35 | Miscellaneous | Jul | West | 1156.80 |
38 | Miscellaneous | Jun | West | 2280.97 |
36 | Miscellaneous | Jun | Central | 1878.07 |
37 | Miscellaneous | Jun | North | 513.99 |
39 | Suspension & traction | Aug | Central | 5416.70 |
40 | Suspension & traction | Aug | North | 4923.69 |
41 | Suspension & traction | Aug | West | 1080.79 |
42 | Suspension & traction | Jul | Central | 6456.72 |
43 | Suspension & traction | Jul | North | 3714.28 |
44 | Suspension & traction | Jul | West | 2939.32 |
46 | Suspension & traction | Jun | North | 8065.74 |
45 | Suspension & traction | Jun | Central | 3325.00 |
47 | Suspension & traction | Jun | West | 2372.52 |
Lets visualize the findiings to understand it better¶
import matplotlib.pyplot as plt
import seaborn as sns
# Set the figure size
plt.figure(figsize=(12, 6))
# Create a bar plot
sns.barplot(
data=revenue_by_product_line,
x="product_line",
y="net_revenue",
hue="warehouse",
errorbar=None # Replaces ci=None
)
# Rotate x-axis labels for better readability
plt.xticks(rotation=45)
# Add labels and title
plt.xlabel("Product Line")
plt.ylabel("Net Revenue")
plt.title("Net Revenue by Product Line and Warehouse (Wholesale Only)")
# Show legend
plt.legend(title="Warehouse")
# Display the plot
plt.show()
df
order_number | date | warehouse | client_type | product_line | quantity | unit_price | total | payment | payment_fee | month | net_revenue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | N1 | 2021-06-01 | North | Retail | Braking system | 9 | 19.29 | 173.61 | Cash | 0.00 | Jun | 173.61 |
1 | N2 | 2021-06-01 | North | Retail | Suspension & traction | 8 | 32.93 | 263.45 | Credit card | 0.03 | Jun | 263.42 |
2 | N3 | 2021-06-01 | North | Wholesale | Frame & body | 16 | 37.84 | 605.44 | Transfer | 0.01 | Jun | 605.43 |
3 | N4 | 2021-06-01 | North | Wholesale | Suspension & traction | 40 | 37.37 | 1494.80 | Transfer | 0.01 | Jun | 1494.79 |
4 | N5 | 2021-06-01 | North | Retail | Frame & body | 6 | 45.44 | 272.61 | Credit card | 0.03 | Jun | 272.58 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | W176 | 2021-08-27 | West | Retail | Electrical system | 4 | 21.47 | 85.89 | Credit card | 0.03 | Aug | 85.86 |
996 | W177 | 2021-08-28 | West | Wholesale | Miscellaneous | 32 | 25.42 | 813.44 | Transfer | 0.01 | Aug | 813.43 |
997 | W178 | 2021-08-28 | West | Retail | Electrical system | 6 | 28.37 | 170.21 | Credit card | 0.03 | Aug | 170.18 |
998 | W179 | 2021-08-28 | West | Wholesale | Braking system | 32 | 10.03 | 320.96 | Transfer | 0.01 | Aug | 320.95 |
999 | W180 | 2021-08-28 | West | Wholesale | Electrical system | 12 | 32.80 | 393.64 | Transfer | 0.01 | Aug | 393.63 |
1000 rows × 12 columns
Lets compute total revenue, total net revenue, and average revenue per order¶
# Total revenue and net revenue
total_revenue = df["total"].sum()
total_net_revenue = df["net_revenue"].sum()
avg_revenue_per_order = df["net_revenue"].mean()
print(f"Total Revenue: ${total_revenue:,.2f}")
print(f"Total Net Revenue: ${total_net_revenue:,.2f}")
print(f"Average Revenue per Order: ${avg_revenue_per_order:,.2f}")
Total Revenue: $289,113.00 Total Net Revenue: $289,090.98 Average Revenue per Order: $289.09
df['month'].value_counts()
month Jul 345 Jun 338 Aug 317 Name: count, dtype: int64
Lets see the Revenue Trends Over Time¶
# Group by month and sum net revenue
monthly_revenue = df.groupby("month")["net_revenue"].sum().reset_index()
# Sort months in order, we have data for three months only
month_order = ["Jun", "Jul", "Aug" ]
monthly_revenue["month"] = pd.Categorical(monthly_revenue["month"], categories=month_order, ordered=True)
monthly_revenue = monthly_revenue.sort_values("month")
# Plot
plt.figure(figsize=(10, 5))
sns.lineplot(x="month", y="net_revenue", data=monthly_revenue, marker="o", linewidth=2)
plt.title("Monthly Revenue Trend")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Month")
plt.grid(True)
plt.show()
C:\Users\newbe\anaconda3\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True): C:\Users\newbe\anaconda3\Lib\site-packages\seaborn\_oldcore.py:1119: FutureWarning: use_inf_as_na option is deprecated and will be removed in a future version. Convert inf values to NaN before operating instead. with pd.option_context('mode.use_inf_as_na', True):
### lets analyze and visualize Payment Method Preferences
# Count of each payment method
payment_counts = df["payment"].value_counts()
# makine a Pie chart
plt.figure(figsize=(7, 7))
plt.pie(payment_counts, labels=payment_counts.index, autopct="%1.1f%%", colors=["#ff9999","#66b3ff","#99ff99"])
plt.title("Payment Method Distribution")
plt.show()
lets compare retail vs wholesale revenue¶
# Group by client_type
client_revenue = df.groupby("client_type")["net_revenue"].sum().reset_index()
# Bar plot
plt.figure(figsize=(8, 5))
sns.barplot(x="client_type", y="net_revenue", data=client_revenue, palette="coolwarm")
plt.title("Retail vs Wholesale Revenue")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Client Type")
plt.show()
Analyzing the warehouse performance¶
# Group by warehouse
warehouse_revenue = df.groupby("warehouse")["net_revenue"].sum().reset_index()
# Using Bar plot
plt.figure(figsize=(10, 5))
sns.barplot(x="warehouse", y="net_revenue", data=warehouse_revenue, palette="viridis")
plt.title("Revenue by Warehouse")
plt.ylabel("Net Revenue ($)")
plt.xlabel("Warehouse")
plt.show()